Goto

Collaborating Authors

 cross-validation confidence interval


Cross-validation Confidence Intervals for Test Error

Neural Information Processing Systems

This work develops central limit theorems for cross-validation and consistent estimators of the asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.


Cross-validation Confidence Intervals for Test Error

Neural Information Processing Systems

This work develops central limit theorems for cross-validation and consistent estimators of the asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.


Review for NeurIPS paper: Cross-validation Confidence Intervals for Test Error

Neural Information Processing Systems

Weaknesses: Some major comments: 1) The connection to algorithmic stability is interesting, but I am not convinced that this can deliver as strong results as we would like beyond what can already be achieved through standard results/analysis. More specifically, algorithmic stability has mostly shown O(1/n) results for ERM or SGD, but this is just a rehashing of standard results, essentially following from iid-ness, that is, that every datapoint contributes the same information on average. This is not a problem with the current paper per se, but more a critique of algorithmic stability analysis. Rather, my concern for the current paper is twofold: a) the connection to algorithmic stability cannot deliver, as far as I understand, any stronger results than what is already possible through standard methods; b) and thus a basic CLT for CV error is attainable through a more standard analysis. Indeed, the path to asymptotic normality is pretty straightforward in the paper, since all important steps are more-or-less assumed: Square integrability of mean loss \bar h_n, song convexity of such loss function which guarantees O(1/n) rates, etc. 2) The experimental setup is very confusing to me.


Review for NeurIPS paper: Cross-validation Confidence Intervals for Test Error

Neural Information Processing Systems

The reviewers were all rather positive about the theoretical contribution, although one minority negative review (R1) gave a low score due an the experimental setup deemed unconvincing. Overall I recommend acceptance, possibly asking the authors to make some revisions to the experimental section to address some criticisms of R1.


Cross-validation Confidence Intervals for Test Error

Neural Information Processing Systems

This work develops central limit theorems for cross-validation and consistent estimators of the asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.